Abstract

Associated kernels have been introduced to improve on the classical continuous kernels for smoothing any functional on several kinds of supports, such as bounded continuous and discrete sets. This work deals with the effects of combined associated kernels on nonparametric multiple regression functions. Using the Nadaraya-Watson estimator with optimal bandwidth matrices selected by cross-validation, different behaviours of multiple regression estimations are pointed out according to the type of multivariate associated kernel, with or without correlation structure. Simulation studies show no effect of correlation structures for continuous regression functions or for the associated continuous kernels; however, the choice of multivariate associated kernel has a real effect depending on whether the support of the multiple regression function is bounded continuous or discrete. Applications are made on two real datasets.
1. Introduction

Consider the relation between a response variable $Y$ and a $d$-vector ($d > 1$) of explanatory variables $\mathbf{x}$ given by

$$Y = m(\mathbf{x}) + \epsilon, \tag{1.1}$$

where $m$ is the unknown regression function from $\mathbb{T}_d \subseteq \mathbb{R}^d$ to $\mathbb{R}$ and $\epsilon$ is the disturbance term with null mean and finite variance. Let $(\mathbf{X}_1, Y_1), \ldots, (\mathbf{X}_n, Y_n)$ be a sequence of independent and identically distributed (iid) random vectors on $\mathbb{T}_d \times \mathbb{R}$ $(\subseteq \mathbb{R}^{d+1})$ with $m(\mathbf{x}) = \mathbb{E}(Y \mid \mathbf{X} = \mathbf{x})$ of (1.1). The Nadaraya (1964) and Watson (1964) estimator of $m$, using continuous classical (symmetric) kernels, is


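$$\widehat{m}_n(\mathbf{x}; \mathbf{H}) = \sum_{i=1}^{n} \frac{Y_i \, K\{\mathbf{H}^{-1}(\mathbf{x} - \mathbf{X}_i)\}}{\sum_{j=1}^{n} K\{\mathbf{H}^{-1}(\mathbf{x} - \mathbf{X}_j)\}}, \quad \forall \mathbf{x} \in \mathbb{T}_d := \mathbb{R}^d, \tag{1.2}$$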


where $\mathbf{H}$ is the symmetric and positive definite bandwidth matrix of dimension $d \times d$ and the function $K(\cdot)$ is the multivariate kernel, assumed to be a spherically symmetric probability density function. Since the choice of the kernel $K$ is not important in the classical case, we use the common notation $\widehat{m}_n(\mathbf{x}; \mathbf{H})$ for classical kernel regression. The multivariate classical kernel (e.g. Gaussian) is suitable only for regression functions on unbounded supports (i.e. $\mathbb{R}^d$); see also Scott (1992). Racine and Li (2004) proposed products of kernels composed of univariate Gaussian kernels for continuous variables and Aitchison and Aitken (1976) kernels for categorical variables; see also Hayfield and Racine (2007) for some implementations and uses of these multiple kernels. Notice that symmetric kernels put weight outside the support of variables whose supports are bounded. In the univariate continuous case, Chen (1999, 2000ab) is one of the pioneers who proposed asymmetric kernels (i.e. beta and gamma) whose supports coincide with those of the functions to be estimated. Zhang (2010) and Zhang and Karunamuni (2010) studied the performance of these beta and gamma kernel estimators at the boundaries in comparison with those of the classical kernels. Recently, Libengue (2013) investigated several families of these univariate continuous kernels that he called univariate associated kernels; see also Kokonendji et al. (2007), Kokonendji and Senga Kiesse (2011), Zougab et al. (2012) and Wansouwe et al. (2014) for univariate discrete situations. A continuous multivariate version of these associated kernels has been studied by Kokonendji and Somé (2015) for density estimation.

The main goal of this work is to consider multivariate associated kernels and then to investigate the importance of their choice in multiple regression. These associated kernels are appropriate for both continuous and count explanatory variables. In fact, in order to estimate the regression function $m$ in (1.1), we propose multiple (or product of) associated kernels composed of univariate discrete associated kernels (e.g. binomial, discrete triangular) and continuous ones (e.g. beta, Epanechnikov). We will also use a bivariate beta kernel with correlation structure. Another motivation of this work is to investigate the effect of the correlation structure of the explanatory variables in continuous regression estimation. These associated kernels suit this situation of mixed axes, as they fully respect the support of each explanatory variable. In other words, we will measure the effect of the type of associated kernel, denoted $\kappa$, in multiple regression by simulations and applications.
The rest of the paper is organized as follows. Section 2 gives a general definition of multivariate associated kernels, which includes the continuous classical symmetric kernels and the multiple kernels composed of univariate discrete and continuous ones. For each definition, the corresponding kernel regression, appropriate for both continuous and discrete explanatory variables, is given. In Section 3, we explore the importance of choosing appropriate associated kernels according to the support of the variables, through simulation studies and real data analysis. Finally, a summary and final remarks are given in Section 4.


2. Multiple regression by associated kernels 

2.1. Definition 

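In order to include both discrete and continuous regressors, we assume $\mathbb{T}_d$ is any subset of $\mathbb{R}^d$. More precisely, for $j = 1, \ldots, d$, let us consider on $\mathbb{T}_d = \times_{j=1}^{d} \mathbb{T}_1^{[j]}$ the measure $\nu = \nu_1 \otimes \cdots \otimes \nu_d$, where $\nu_j$ is a Lebesgue or count measure on the corresponding univariate support $\mathbb{T}_1^{[j]}$. Under these assumptions, the associated kernel $K_{\mathbf{x},\mathbf{H}}(\cdot)$ which replaces the classical kernel $K(\cdot)$ of (1.2) is a probability density function (pdf) in relation to the measure $\nu$. This kernel can be defined as follows.

Definition 2.1. Let $\mathbb{T}_d$ $(\subseteq \mathbb{R}^d)$ be the support of the regressors, $\mathbf{x} \in \mathbb{T}_d$ a target vector and $\mathbf{H}$ a bandwidth matrix. A parametrized pdf $K_{\mathbf{x},\mathbf{H}}(\cdot)$ of support $\mathbb{S}_{\mathbf{x},\mathbf{H}}$ $(\subseteq \mathbb{R}^d)$ is called "multivariate (or general) associated kernel" if the following conditions are satisfied:

$$\mathbf{x} \in \mathbb{S}_{\mathbf{x},\mathbf{H}}, \tag{2.1}$$

$$\mathbb{E}(\mathcal{Z}_{\mathbf{x},\mathbf{H}}) = \mathbf{x} + \mathbf{a}(\mathbf{x}, \mathbf{H}), \tag{2.2}$$

$$\mathrm{Cov}(\mathcal{Z}_{\mathbf{x},\mathbf{H}}) = \mathbf{B}(\mathbf{x}, \mathbf{H}), \tag{2.3}$$

where $\mathcal{Z}_{\mathbf{x},\mathbf{H}}$ denotes the random vector with pdf $K_{\mathbf{x},\mathbf{H}}$, and both $\mathbf{a}(\mathbf{x}, \mathbf{H}) = (a_1(\mathbf{x}, \mathbf{H}), \ldots, a_d(\mathbf{x}, \mathbf{H}))^{\top}$ and $\mathbf{B}(\mathbf{x}, \mathbf{H}) = (b_{ij}(\mathbf{x}, \mathbf{H}))_{i,j=1,\ldots,d}$ tend, respectively, to the null vector $\mathbf{0}$ and the null matrix $\mathbf{0}_d$ as $\mathbf{H}$ goes to $\mathbf{0}_d$.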

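From this definition and in comparison with (1.2), the Nadaraya-Watson estimator using associated kernels is

$$\widehat{m}_n(\mathbf{x}; \kappa, \mathbf{H}) = \sum_{i=1}^{n} \frac{Y_i \, K_{\mathbf{x},\mathbf{H}}(\mathbf{X}_i)}{\sum_{j=1}^{n} K_{\mathbf{x},\mathbf{H}}(\mathbf{X}_j)}, \quad \forall \mathbf{x} \in \mathbb{T}_d \subseteq \mathbb{R}^d, \tag{2.4}$$

where $\mathbf{H} \equiv \mathbf{H}_n$ is the bandwidth matrix such that $\mathbf{H}_n \to \mathbf{0}_d$ as $n \to \infty$, and $\kappa$ represents the type of associated kernel $K_{\mathbf{x},\mathbf{H}}$, parametrized by $\mathbf{x}$ and $\mathbf{H}$. Without loss of generality, and to point out the effect of $\kappa$, we will hereafter write $\widehat{m}_n(\mathbf{x}; \kappa, \mathbf{H}) = \widehat{m}_n(\mathbf{x}; \kappa)$, since the bandwidth matrix is here investigated only by cross-validation.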
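As an illustration of (2.4), the following minimal Python sketch computes the kernel-weighted average of the responses at one target vector. The function names (`nadaraya_watson`, `gaussian_classical_kernel`) and the toy data are assumptions made for this example only; the Gaussian kernel rescaled by $\mathbf{H}$ is used simply because it is the most familiar classical choice, not because the paper prescribes it.

```python
import numpy as np

def nadaraya_watson(x, X, Y, kernel):
    """Kernel-weighted average of the responses at target x, as in (2.4).

    x      : target vector, shape (d,)
    X      : observed regressors, shape (n, d)
    Y      : observed responses, shape (n,)
    kernel : callable kernel(x, u) returning K_{x,H}(u) >= 0 (H is fixed inside)
    """
    weights = np.array([kernel(x, u) for u in X])
    total = weights.sum()
    if total == 0.0:
        return float("nan")  # no observation receives weight at this target
    return float(np.dot(weights, Y) / total)

def gaussian_classical_kernel(H):
    """Illustrative classical kernel: standard Gaussian density rescaled by H."""
    H = np.asarray(H, dtype=float)
    H_inv = np.linalg.inv(H)
    d = H.shape[0]
    norm = (2.0 * np.pi) ** (d / 2.0) * abs(np.linalg.det(H))

    def kernel(x, u):
        z = H_inv @ (np.asarray(x, dtype=float) - np.asarray(u, dtype=float))
        return float(np.exp(-0.5 * z @ z) / norm)

    return kernel

# Toy data (hypothetical): four design points on the unit square.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Y = np.array([1.0, 2.0, 3.0, 4.0])
estimate = nadaraya_watson(np.array([0.5, 0.5]), X, Y,
                           gaussian_classical_kernel(0.5 * np.eye(2)))
```

At the centre of the square all four observations are equidistant from the target, so they receive equal weight and the estimate reduces to the plain average of the responses. Returning `nan` when every weight is zero is a design choice of this sketch; it mirrors the situation where a compactly supported kernel finds no observation near the target.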


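The following two examples provide well-known and interesting particular cases of multivariate associated kernel estimators. The first can be seen as an interpretation of classical associated kernels through continuous symmetric kernels. The second deals with non-classical associated kernels without correlation structure.

Given a target vector $\mathbf{x} \in \mathbb{R}^d =: \mathbb{T}_d$ and a bandwidth matrix $\mathbf{H}$, it follows that the classical kernel $K$ in (1.2), with support $\mathbb{S}_d$, null mean vector and covariance matrix $\mathbf{\Sigma}$, induces the so-called (multivariate) classical associated kernel:

(i) $$K_{\mathbf{x},\mathbf{H}}(\cdot) = \frac{1}{\det \mathbf{H}} K\{\mathbf{H}^{-1}(\mathbf{x} - \cdot)\} \tag{2.5}$$

on $\mathbb{S}_{\mathbf{x},\mathbf{H}} = \mathbf{x} - \mathbf{H}\mathbb{S}_d$ with $\mathbb{E}(\mathcal{Z}_{\mathbf{x},\mathbf{H}}) = \mathbf{x}$ (i.e. $\mathbf{a}(\mathbf{x}, \mathbf{H}) = \mathbf{0}$) and $\mathrm{Cov}(\mathcal{Z}_{\mathbf{x},\mathbf{H}}) = \mathbf{H}\mathbf{\Sigma}\mathbf{H}$;

(ii) $$K_{\mathbf{x},\mathbf{H}}(\cdot) = \frac{1}{(\det \mathbf{H})^{1/2}} K\{\mathbf{H}^{-1/2}(\mathbf{x} - \cdot)\}$$

on $\mathbb{S}_{\mathbf{x},\mathbf{H}} = \mathbf{x} - \mathbf{H}^{1/2}\mathbb{S}_d$ with $\mathbb{E}(\mathcal{Z}_{\mathbf{x},\mathbf{H}}) = \mathbf{x}$ (i.e. $\mathbf{a}(\mathbf{x}, \mathbf{H}) = \mathbf{0}$) and $\mathrm{Cov}(\mathcal{Z}_{\mathbf{x},\mathbf{H}}) = \mathbf{H}^{1/2}\mathbf{\Sigma}\mathbf{H}^{1/2}$.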

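A second particular case of Definition 2.1, appropriate for both continuous and count explanatory variables without correlation structure, is presented as follows.

Let $\mathbf{x} = (x_1, \ldots, x_d)^{\top} \in \times_{j=1}^{d} \mathbb{T}_1^{[j]} =: \mathbb{T}_d$ and $\mathbf{H} = \mathrm{Diag}(h_{11}, \ldots, h_{dd})$ with $h_{jj} > 0$. Let $K^{[j]}_{x_j,h_{jj}}$ be a (discrete or continuous) univariate associated kernel (see Definition 2.1 for $d = 1$) with its corresponding random variable $\mathcal{Z}^{[j]}_{x_j,h_{jj}}$ on $\mathbb{S}_{x_j,h_{jj}}$ $(\subseteq \mathbb{R})$ for all $j = 1, \ldots, d$. Then, the multiple associated kernel is also a multivariate associated kernel:

$$K_{\mathbf{x},\mathbf{H}}(\cdot) = \prod_{j=1}^{d} K^{[j]}_{x_j,h_{jj}}(\cdot) \tag{2.6}$$

on $\mathbb{S}_{\mathbf{x},\mathbf{H}} = \times_{j=1}^{d} \mathbb{S}_{x_j,h_{jj}}$ with $\mathbb{E}(\mathcal{Z}_{\mathbf{x},\mathbf{H}}) = (x_1 + a_1(x_1, h_{11}), \ldots, x_d + a_d(x_d, h_{dd}))^{\top}$ and $\mathrm{Cov}(\mathcal{Z}_{\mathbf{x},\mathbf{H}}) = \mathrm{Diag}(b_{jj}(x_j, h_{jj}))_{j=1,\ldots,d}$. In other words, the random variables $\mathcal{Z}^{[j]}_{x_j,h_{jj}}$ are independent components of the random vector $\mathcal{Z}_{\mathbf{x},\mathbf{H}}$.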

Here, in addition to the Nadaraya-Watson estimator using general associated kernels given in (2.4), we propose a slight variant. In fact, for multivariate supports composed of continuous and discrete univariate supports, we lack appropriate general associated kernels. Therefore, with the multiple associated kernels (2.6), the estimator (2.4) becomes

$$\widehat{m}_n(\mathbf{x}; \kappa) = \sum_{i=1}^{n} \frac{Y_i \prod_{j=1}^{d} K^{[j]}_{x_j,h_{jj}}(X_{ij})}{\sum_{i'=1}^{n} \prod_{j=1}^{d} K^{[j]}_{x_j,h_{jj}}(X_{i'j})}, \quad \forall \mathbf{x} = (x_1, \ldots, x_d)^{\top} \in \mathbb{T}_d := \times_{j=1}^{d} \mathbb{T}_1^{[j]}. \tag{2.7}$$

In theory and in practice, one often uses (2.7) from the multiple associated kernels (2.6), which are more manageable than (2.4); see, e.g., Scott (1992) and also Bouezmarni and Rombouts (2009) for density estimation.
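The product form of (2.6) and (2.7) can be sketched in a few lines of Python. The example below mixes one continuous and one categorical axis, using an Epanechnikov-type kernel and a DiracDU-type kernel of the kind described in Section 2.2; the helper names, the number of categories `c = 3` and the three observations are hypothetical and chosen only so that the result can be checked by hand.

```python
import math

def epanechnikov_kernel(x, h, u):
    """Epanechnikov kernel centred at x with half-width h."""
    t = (u - x) / h
    return 0.75 / h * (1.0 - t * t) if abs(t) <= 1.0 else 0.0

def make_diracdu_kernel(c):
    """DiracDU-type kernel on {0, ..., c-1}: mass 1-h on x, h/(c-1) elsewhere."""
    def kernel(x, h, u):
        return 1.0 - h if u == x else h / (c - 1)
    return kernel

def product_kernel(univariate_kernels, bandwidths):
    """Multiple associated kernel: product of univariate kernels, one per axis."""
    def kernel(x, u):
        value = 1.0
        for k_j, h_j, x_j, u_j in zip(univariate_kernels, bandwidths, x, u):
            value *= k_j(x_j, h_j, u_j)
        return value
    return kernel

def multiple_kernel_regression(x, X, Y, univariate_kernels, bandwidths):
    """Nadaraya-Watson estimate with a product kernel, as in (2.7)."""
    kernel = product_kernel(univariate_kernels, bandwidths)
    weights = [kernel(x, u) for u in X]
    total = sum(weights)
    if total == 0.0:
        return math.nan  # no observation falls inside the kernel support
    return sum(w * y for w, y in zip(weights, Y)) / total

# Hypothetical mixed data: first axis continuous, second axis categorical.
X = [(0.5, 1), (0.5, 2), (2.0, 1)]
Y = [1.0, 3.0, 10.0]
kernels = [epanechnikov_kernel, make_diracdu_kernel(c=3)]
estimate = multiple_kernel_regression((0.5, 1), X, Y, kernels, bandwidths=(1.0, 0.2))
```

With these values the three weights are 0.6, 0.075 and 0 (the third point lies outside the Epanechnikov support), so the estimate is 0.825 / 0.675 = 11/9. Passing the univariate kernels as a list keeps the estimator agnostic about which axis is discrete and which is continuous.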


2.2. Associated kernels for illustration 

In order to point out the importance of the type of kernel $\kappa$ in a regression study, we present below the kernels that will be used in simulations. These are seven basic associated kernels: three are univariate discrete, three are univariate continuous, and the last one is a bivariate beta with correlation structure.


• The binomial kernel (Bin) is defined on the support $\mathbb{S}_x = \{0, 1, \ldots, x + 1\}$ with $x \in \mathbb{T}_1 := \mathbb{N} = \{0, 1, \ldots\}$ and then $h \in (0, 1]$:

$$B_{x,h}(u) = \frac{(x+1)!}{u!\,(x+1-u)!} \left(\frac{x+h}{x+1}\right)^{u} \left(\frac{1-h}{x+1}\right)^{x+1-u} \mathbb{1}_{\mathbb{S}_x}(u),$$

where $\mathbb{1}_A$ denotes the indicator function of any given event $A$. Note that $B_{x,h}$ is the probability mass function (pmf) of the binomial distribution $\mathcal{B}(x + 1; (x + h)/(x + 1))$ with number of trials $x + 1$ and success probability $(x + h)/(x + 1)$ in each trial. It is appropriate for count data with small or moderate sample sizes and, also, it does not satisfy (2.3); see Kokonendji and Senga Kiesse (2011) and also Zougab et al. (2012) for a bandwidth selection by Bayesian method.


• For fixed arm $a \in \mathbb{N}$, the discrete triangular kernel (DTra) is defined on $\mathbb{S}_{x,a} = \{x, x \pm 1, \ldots, x \pm a\}$ with $x \in \mathbb{T}_1 = \mathbb{N}$:

$$DT_{x,h;a}(u) = \frac{(a+1)^{h} - |u - x|^{h}}{P(a,h)} \mathbb{1}_{\mathbb{S}_{x,a}}(u),$$

where $h > 0$ and $P(a, h) = (2a + 1)(a + 1)^{h} - 2\sum_{k=0}^{a} k^{h}$ is the normalizing constant. It is symmetric around the target $x$, satisfies Definition 2.1 and is suitable for count variables; see Kokonendji et al. (2007) and also Kokonendji and Zocchi (2010) for an asymmetric version.


• From Aitchison and Aitken (1976), Kokonendji and Senga Kiesse (2011) deduced the following discrete kernel, which we here label DiracDU (DirDU) for "Dirac Discrete Uniform". For a fixed number of categories $c \in \{2, 3, \ldots\}$, we define $\mathbb{S}_c = \{0, 1, \ldots, c - 1\}$ and

$$DU_{x,h;c}(u) = (1 - h)\,\mathbb{1}_{\{x\}}(u) + \frac{h}{c - 1}\,\mathbb{1}_{\mathbb{S}_c \setminus \{x\}}(u),$$

where $h \in (0, 1]$ and $x \in \mathbb{T}_1 = \mathbb{S}_c$. This DiracDU kernel is symmetric around the target, satisfies Definition 2.1 and is appropriate for a categorical set $\mathbb{T}_1$. See, e.g., Racine and Li (2004) for some uses.
• From the well-known Epanechnikov (1969) kernel $K^{E}(u) = \frac{3}{4}(1 - u^2)\,\mathbb{1}_{[-1,1]}(u)$, we define its associated version (Epan) on $\mathbb{S}_{x,h} = [x - h, x + h]$ with $x \in \mathbb{T}_1 := \mathbb{R}$ and $h > 0$:

$$K^{E}_{x,h}(u) = \frac{3}{4h}\left\{1 - \left(\frac{u - x}{h}\right)^{2}\right\} \mathbb{1}_{[x-h,\,x+h]}(u).$$

It is obtained through (2.5) and is well adapted for continuous variables with unbounded supports.

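• The gamma kernel (Gamma) is defined on $\mathbb{S}_{x,h} = [0, \infty) = \mathbb{T}_1$ with $x \in \mathbb{T}_1$ and $h > 0$:

$$GA_{x,h}(u) = \frac{u^{x/h}}{\Gamma(1 + x/h)\, h^{1 + x/h}} \exp\left(-\frac{u}{h}\right) \mathbb{1}_{[0,\infty)}(u),$$

where $\Gamma(\cdot)$ is the classical gamma function. It is the pdf of the gamma distribution $\mathcal{G}a(1 + x/h, h)$ with shape parameter $1 + x/h$ and scale parameter $h$. It satisfies Definition 2.1 and suits the non-negative real set $\mathbb{T}_1$; see Chen (2000a).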




• The beta kernel (Beta) is defined on $\mathbb{S}_{x,h} = [0, 1] = \mathbb{T}_1$ with $x \in \mathbb{T}_1$ and $h > 0$:

$$BE_{x,h}(u) = \frac{u^{x/h}(1 - u)^{(1 - x)/h}}{\mathscr{B}(1 + x/h,\, 1 + (1 - x)/h)} \mathbb{1}_{[0,1]}(u),$$

where $\mathscr{B}(r, s) = \int_0^1 t^{r-1}(1 - t)^{s-1}\,dt$ is the usual beta function with $r > 0$ and $s > 0$. It is the pdf of the beta distribution $\mathcal{B}e(1 + x/h, 1 + (1 - x)/h)$ with shape parameters $1 + x/h$ and $1 + (1 - x)/h$. This pdf satisfies Definition 2.1 and is appropriate for rates, proportions and percentages in $\mathbb{T}_1$; see Chen (1999).


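• We finally consider the bivariate beta kernel (Bivariate beta) defined by

$$BS_{\mathbf{x},\mathbf{H}}(u_1, u_2) = \left(\frac{u_1^{x_1/h_{11}}(1 - u_1)^{(1 - x_1)/h_{11}}}{\mathscr{B}(1 + x_1/h_{11},\, 1 + (1 - x_1)/h_{11})}\right) \left(\frac{u_2^{x_2/h_{22}}(1 - u_2)^{(1 - x_2)/h_{22}}}{\mathscr{B}(1 + x_2/h_{22},\, 1 + (1 - x_2)/h_{22})}\right) \times \left(1 + h_{12}\,\frac{u_1 - \tilde{\mu}_1(x_1, h_{11})}{h_{11}^{1/2}\,\tilde{\sigma}_1(x_1, h_{11})} \times \frac{u_2 - \tilde{\mu}_2(x_2, h_{22})}{h_{22}^{1/2}\,\tilde{\sigma}_2(x_2, h_{22})}\right) \mathbb{1}_{[0,1]}(u_1)\,\mathbb{1}_{[0,1]}(u_2), \tag{2.8}$$

with $\mathbb{S}_{\mathbf{x},\mathbf{H}} = \mathbb{T}_2 = [0, 1]^2$, $\mathbf{x} = (x_1, x_2)^{\top} \in \mathbb{T}_2$ and $\mathbf{H} = \begin{pmatrix} h_{11} & h_{12} \\ h_{12} & h_{22} \end{pmatrix}$. For $j = 1, 2$, the characteristics in (2.8) are given by $h_{jj} > 0$, $\tilde{\mu}_j(x_j, h_{jj}) = (x_j + h_{jj})/(1 + 2h_{jj})$, $\tilde{\sigma}_j^2(x_j, h_{jj}) = (x_j + h_{jj})(1 + h_{jj} - x_j)(1 + 2h_{jj})^{-2}(1 + 3h_{jj})^{-1}h_{jj}$, and the constraints

$$h_{12} \in [\beta, \beta'] \cap \left(-\sqrt{h_{11}h_{22}},\ \sqrt{h_{11}h_{22}}\right) \tag{2.9}$$

with

$$\beta = -\left(\max_{v_1, v_2}\left\{\frac{v_1 - \tilde{\mu}_1(x_1, h_{11})}{h_{11}^{1/2}\,\tilde{\sigma}_1(x_1, h_{11})} \times \frac{v_2 - \tilde{\mu}_2(x_2, h_{22})}{h_{22}^{1/2}\,\tilde{\sigma}_2(x_2, h_{22})}\right\}\right)^{-1}$$

and

$$\beta' = -\left(\min_{v_1, v_2}\left\{\frac{v_1 - \tilde{\mu}_1(x_1, h_{11})}{h_{11}^{1/2}\,\tilde{\sigma}_1(x_1, h_{11})} \times \frac{v_2 - \tilde{\mu}_2(x_2, h_{22})}{h_{22}^{1/2}\,\tilde{\sigma}_2(x_2, h_{22})}\right\}\right)^{-1}.$$

It satisfies Definition 2.1 and is adapted for bivariate rates. The full bandwidth matrix $\mathbf{H}$ allows any orientation of the kernel. Therefore, it can reach any point of the space which might be inaccessible with a diagonal matrix. This type of kernel is called the beta-Sarmanov kernel by Kokonendji and Somé (2015); see Sarmanov (1966) and also Lee (1996) for this construction of multivariate densities with correlation structure from independent components. As in Bertin and Klutchnikoff (2014), minimax properties can also be studied for this bivariate beta kernel and, more generally, for associated kernels.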
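The admissible range of the off-diagonal bandwidth $h_{12}$ in (2.9) can be computed explicitly, as in the hedged Python sketch below. It assumes that the correlation factor of the beta-Sarmanov kernel has the form $1 + h_{12}\,\phi_1(u_1)\,\phi_2(u_2)$ with $\phi_j(u) = (u - \tilde{\mu}_j)/(h_{jj}^{1/2}\tilde{\sigma}_j)$, and that $\beta$ and $\beta'$ are minus the reciprocals of the maximum and minimum of $\phi_1\phi_2$ over $[0,1]^2$. Because this product is affine in each argument, its extremes are attained at the four corners of the unit square. All function names are hypothetical.

```python
import math

def beta_kernel_moments(x, h):
    """Mean and standard deviation of the Beta(1 + x/h, 1 + (1 - x)/h) kernel."""
    mean = (x + h) / (1.0 + 2.0 * h)
    variance = (x + h) * (1.0 + h - x) * h / ((1.0 + 2.0 * h) ** 2 * (1.0 + 3.0 * h))
    return mean, math.sqrt(variance)

def corner_products(x1, x2, h11, h22):
    """Values of phi_1(v1) * phi_2(v2) at the four corners of [0, 1]^2."""
    m1, s1 = beta_kernel_moments(x1, h11)
    m2, s2 = beta_kernel_moments(x2, h22)
    return [((v1 - m1) / (math.sqrt(h11) * s1)) * ((v2 - m2) / (math.sqrt(h22) * s2))
            for v1 in (0.0, 1.0) for v2 in (0.0, 1.0)]

def h12_interval(x1, x2, h11, h22):
    """Closure of the admissible range of h12 under the assumed constraints (2.9)."""
    products = corner_products(x1, x2, h11, h22)
    lower = -1.0 / max(products)   # beta: keeps the factor >= 0 when h12 < 0
    upper = -1.0 / min(products)   # beta': keeps the factor >= 0 when h12 > 0
    bound = math.sqrt(h11 * h22)   # H positive definite needs |h12| < bound (strict)
    return max(lower, -bound), min(upper, bound)

lower, upper = h12_interval(0.8, 0.5, 0.3, 0.3)
```

For targets inside $[0,1]^2$ the kernel means lie strictly inside $(0,1)$, so the corner products take both signs and the interval always contains 0, which corresponds to the product Beta×Beta kernel. The interval is typically narrow, which is consistent with the later remark that the constraints make the full bandwidth matrix hard to select.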



Figure 2.1: Shapes of univariate (discrete and continuous) associated kernels: (a) DiracDU, discrete triangular a = 3 and binomial with same target x = 4 and bandwidth h = 0.13; (b) Epanechnikov, beta and gamma with same x = 0.8 and h = 0.3.


Figure 2.1 shows some forms of the above-mentioned univariate associated kernels. The plots highlight the importance given to the target point and its neighbourhood in both discrete and continuous cases. Furthermore, for a fixed bandwidth $h$, the classical associated kernel of Epanechnikov, and also the categorical DiracDU kernel, keep the same shape along the support; in contrast, the shapes of the other non-classical associated kernels change according to the target. This explains the inappropriateness of the Epanechnikov kernel for density or regression estimation in any bounded interval (Figure 2.1(b)) and of the DiracDU kernel for count regression estimation (see simulations below).
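To make the univariate kernels of this section concrete, the following Python sketch implements the binomial, discrete triangular, DiracDU, gamma and beta kernels from their standard closed forms and checks numerically that each one is a proper pmf or pdf at the targets and bandwidths quoted for Figure 2.1. The function names, the number of categories `c = 10` and the midpoint integration rule are assumptions of this example; the direct use of `math.gamma` is only adequate for moderate values of `x / h`.

```python
import math

def binomial_kernel(x, h, u):
    """pmf of Binomial(x + 1, (x + h)/(x + 1)) on {0, ..., x + 1}."""
    n = x + 1
    if u < 0 or u > n:
        return 0.0
    p = (x + h) / (x + 1)
    return math.comb(n, u) * p ** u * (1.0 - p) ** (n - u)

def discrete_triangular_kernel(x, h, u, a):
    """Discrete triangular kernel with arm a on {x - a, ..., x + a}."""
    if abs(u - x) > a:
        return 0.0
    norm = (2 * a + 1) * (a + 1) ** h - 2 * sum(k ** h for k in range(a + 1))
    return ((a + 1) ** h - abs(u - x) ** h) / norm

def diracdu_kernel(x, h, u, c):
    """DiracDU kernel on the c categories {0, ..., c - 1}."""
    if u < 0 or u > c - 1:
        return 0.0
    return 1.0 - h if u == x else h / (c - 1)

def gamma_kernel(x, h, u):
    """pdf of the gamma distribution with shape 1 + x/h and scale h."""
    if u < 0.0:
        return 0.0
    shape = 1.0 + x / h
    return u ** (x / h) * math.exp(-u / h) / (math.gamma(shape) * h ** shape)

def beta_kernel(x, h, u):
    """pdf of the beta distribution with shapes 1 + x/h and 1 + (1 - x)/h."""
    if u < 0.0 or u > 1.0:
        return 0.0
    r, s = 1.0 + x / h, 1.0 + (1.0 - x) / h
    beta_fn = math.gamma(r) * math.gamma(s) / math.gamma(r + s)
    return u ** (x / h) * (1.0 - u) ** ((1.0 - x) / h) / beta_fn

def midpoint_integral(f, lo, hi, steps):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(f(lo + (k + 0.5) * width) for k in range(steps))

# Discrete kernels at target x = 4 with h = 0.13; continuous ones at x = 0.8, h = 0.3.
bin_total = sum(binomial_kernel(4, 0.13, u) for u in range(0, 6))
bin_mean = sum(u * binomial_kernel(4, 0.13, u) for u in range(0, 6))
dtr_total = sum(discrete_triangular_kernel(4, 0.13, u, a=3) for u in range(1, 8))
dir_total = sum(diracdu_kernel(4, 0.13, u, c=10) for u in range(10))
beta_total = midpoint_integral(lambda u: beta_kernel(0.8, 0.3, u), 0.0, 1.0, 20000)
gamma_total = midpoint_integral(lambda u: gamma_kernel(0.8, 0.3, u), 0.0, 30.0, 30000)
```

All five totals come out equal to 1 up to numerical error, and the mean of the binomial kernel equals `x + h`, which illustrates condition (2.2) with $a(x, h) = h$.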

2.3. Bandwidth matrix selection by cross validation 

In the context of multivariate kernel regression, the bandwidth matrix is here selected by the well-known least squares cross-validation (LSCV). In fact, for a given associated kernel, the optimal bandwidth matrix is $\widehat{\mathbf{H}} = \arg\min_{\mathbf{H} \in \mathcal{H}} \mathrm{LSCV}(\mathbf{H})$ with

$$\mathrm{LSCV}(\mathbf{H}) = \frac{1}{n}\sum_{i=1}^{n}\left\{Y_i - \widehat{m}_{-i}(\mathbf{X}_i; \kappa)\right\}^{2}, \tag{2.10}$$

where $\widehat{m}_{-i}(\mathbf{X}_i; \kappa)$ is computed as $\widehat{m}_n$ of (2.4) excluding $\mathbf{X}_i$, and $\mathcal{H}$ is the set of bandwidth matrices $\mathbf{H}$; see, e.g., Kokonendji et al. (2009) in the univariate case and also Zhang et al. (2014) and Zougab et al. (2014a) for univariate bandwidth estimation by sampling algorithm methods. For diagonal bandwidth matrices (i.e. multiple associated kernels), the LSCV method uses the set of diagonal matrices $\mathcal{D}$. Concerning the beta-Sarmanov kernel (2.8) with full bandwidth matrix, this LSCV method is used under $\mathcal{H}_1$, a subset of $\mathcal{H}$ satisfying the constraint (2.9) of the associated kernel. The corresponding algorithms are described below and used for numerical studies in the following section.

Algorithms of the LSCV method (2.10) for some types of associated kernels and their corresponding bandwidth matrices

A1. Bivariate beta (2.8) with full bandwidth matrices and dimension d = 2.

1. Choose two intervals $H_{11}$ and $H_{22}$ related to $h_{11}$ and $h_{22}$, respectively.

2. For $\delta = 1, \ldots, \ell(H_{11})$ and $\gamma = 1, \ldots, \ell(H_{22})$,

(a) Compute the interval $H_{12}[\delta, \gamma]$ related to $h_{12}$ from the constraints in (2.9);

(b) For $\lambda = 1, \ldots, \ell(H_{12}[\delta, \gamma])$,

Compose the full bandwidth matrix $\mathbf{H}(\delta, \gamma, \lambda)$ with $h_{11} = H_{11}(\delta)$, $h_{22} = H_{22}(\gamma)$ and $h_{12} = H_{12}[\delta, \gamma](\lambda)$.

3. Apply the LSCV method on the set of all full bandwidth matrices $\mathbf{H}(\delta, \gamma, \lambda)$.

A2. Multiple associated kernels (i.e. diagonal bandwidth matrices) for $d \geq 2$.

1. Choose $d$ intervals $H_{11}, \ldots, H_{dd}$ related to $h_{11}, \ldots, h_{dd}$, respectively.

2. For $\delta_1 = 1, \ldots, \ell(H_{11}), \ldots, \delta_d = 1, \ldots, \ell(H_{dd})$,

Compose the diagonal bandwidth matrix $\mathbf{H}(\delta_1, \ldots, \delta_d) := \mathrm{Diag}(H_{11}(\delta_1), \ldots, H_{dd}(\delta_d))$.

3. Apply the LSCV method on the set $\mathcal{D}$ of all diagonal bandwidth matrices $\mathbf{H}(\delta_1, \ldots, \delta_d)$.

For a given interval $I$, the notation $\ell(I)$ is the total number of subdivisions of $I$ and $I(\eta)$ denotes the real value at the subdivision $\eta$ of $I$. Also, for practical uses of (A1) and (A2), the intervals $H_{11}, \ldots, H_{dd}$ are generally taken according to the chosen associated kernel.
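Algorithm A2 amounts to an exhaustive grid search over diagonal bandwidth matrices. The sketch below is one possible Python rendering under stated assumptions: the grids, the toy design, the Epanechnikov kernel on both axes and all function names are illustrative, and candidates for which some leave-one-out prediction is undefined (zero total weight) are simply skipped.

```python
import itertools
import math

def epanechnikov_kernel(x, h, u):
    """Epanechnikov kernel centred at x with half-width h."""
    t = (u - x) / h
    return 0.75 / h * (1.0 - t * t) if abs(t) <= 1.0 else 0.0

def leave_one_out_fit(i, X, Y, kernels, bandwidths):
    """Product-kernel Nadaraya-Watson estimate at X[i] computed without (X[i], Y[i])."""
    numerator = denominator = 0.0
    for j in range(len(X)):
        if j == i:
            continue
        weight = 1.0
        for k, h, x_c, u_c in zip(kernels, bandwidths, X[i], X[j]):
            weight *= k(x_c, h, u_c)
        numerator += weight * Y[j]
        denominator += weight
    return numerator / denominator if denominator > 0.0 else math.nan

def lscv(X, Y, kernels, bandwidths):
    """Least squares cross-validation criterion (2.10) for one diagonal bandwidth."""
    n = len(X)
    return sum((Y[i] - leave_one_out_fit(i, X, Y, kernels, bandwidths)) ** 2
               for i in range(n)) / n

def select_diagonal_bandwidth(X, Y, kernels, grids):
    """Steps 2 and 3 of A2: scan every combination of the per-axis grids."""
    best_h, best_score = None, math.inf
    for candidate in itertools.product(*grids):
        score = lscv(X, Y, kernels, candidate)
        if not math.isnan(score) and score < best_score:
            best_h, best_score = candidate, score
    return best_h, best_score

# Hypothetical example: noiseless plane observed on a 5 x 5 grid of [0, 1]^2.
X = [(i / 4, j / 4) for i in range(5) for j in range(5)]
Y = [x1 + x2 for x1, x2 in X]
grids = [[0.3, 0.6, 1.0], [0.3, 0.6, 1.0]]
kernels = [epanechnikov_kernel, epanechnikov_kernel]
best_h, best_score = select_diagonal_bandwidth(X, Y, kernels, grids)
```

The cost grows as the product of the grid sizes times $n^2$, which is why the full-matrix variant A1, with its additional inner loop over $h_{12}$, is markedly slower in practice.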


3. Simulation studies and real data analysis 


We apply the multivariate associated kernel estimators $\widehat{m}_n$ of (2.4) and (2.7) to some simulated target regression functions $m$ and then to two real datasets. The multivariate and multiple associated kernels used are built from those of Section 2.2. The optimal bandwidth matrix is here chosen by the LSCV method (2.10) using Algorithms A1 and A2 of Section 2.3 and their indications. Besides the criterion of kernel support, we retain three measures to examine the effect of different associated kernels $\kappa$ on multiple regression. In simulations, it is the average squared error (ASE) defined as

$$ASE(\kappa) = \frac{1}{n}\sum_{i=1}^{n}\left\{m(\mathbf{x}_i) - \widehat{m}_n(\mathbf{x}_i; \kappa)\right\}^{2}.$$

For real datasets, we use the root mean squared error (RMSE), which is linked to the ASE through the square root and by replacing the simulated value $m(\mathbf{x}_i)$ with the observed value $y_i$:

$$RMSE(\kappa) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left\{y_i - \widehat{m}_n(\mathbf{x}_i; \kappa)\right\}^{2}}.$$

Also, we consider the practical coefficient of determination, which quantifies the proportion of variation of the response variable $Y_i$ explained by the non-intercept regressor $\mathbf{x}_i$:

$$R^2(\kappa) = \frac{\sum_{i=1}^{n}\left\{\widehat{m}_n(\mathbf{x}_i; \kappa) - \bar{y}\right\}^{2}}{\sum_{i=1}^{n}(y_i - \bar{y})^{2}},$$

with $\bar{y} = n^{-1}(y_1 + \ldots + y_n)$. All these criteria have their simulated or real data counterparts, obtained by replacing $y_i$ with $m(\mathbf{x}_i)$ and vice versa. Computations have been performed on the supercomputer facilities of the Mésocentre de calcul de Franche-Comté using the R software; see R Development Core Team (2014).
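The three criteria are straightforward to compute once fitted values are available. The following Python sketch (function names and toy numbers are hypothetical; the paper's own computations were done in R) evaluates them on a four-point example that can be verified by hand.

```python
import math

def ase(m_true, m_hat):
    """Average squared error between true regression values and fitted values."""
    n = len(m_true)
    return sum((m - f) ** 2 for m, f in zip(m_true, m_hat)) / n

def rmse(y, m_hat):
    """Root mean squared error between observed responses and fitted values."""
    return math.sqrt(ase(y, m_hat))

def r_squared(y, m_hat):
    """Practical coefficient of determination: explained over total variation."""
    y_bar = sum(y) / len(y)
    explained = sum((f - y_bar) ** 2 for f in m_hat)
    total = sum((v - y_bar) ** 2 for v in y)
    return explained / total

y_obs = [1.0, 2.0, 3.0, 4.0]
fitted = [1.5, 2.0, 3.0, 3.5]
example_rmse = rmse(y_obs, fitted)      # sqrt(0.125)
example_r2 = r_squared(y_obs, fitted)   # 2.5 / 5.0
```

Note that this ratio compares the spread of the fitted values with the spread of the observations. For a nonparametric fit it is not guaranteed to lie below 1, and it can be large even when the residuals are not small, so it is best read alongside the RMSE.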


3.1. Simulation studies 


Except as otherwise stated, each result is obtained with the number of replications $N_{sim} = 100$.
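3.1.1. Bivariate cases

We consider five target regression functions labelled A, B, C, D and E with dimension d = 2.


• Function A is a bivariate beta without correlation, $\rho(x_1, x_2) = 0$:

$$m(x_1, x_2) = \frac{x_1^{p_1 - 1}(1 - x_1)^{q_1 - 1}\, x_2^{p_2 - 1}(1 - x_2)^{q_2 - 1}}{\mathscr{B}(p_1, q_1)\,\mathscr{B}(p_2, q_2)}\, \mathbb{1}_{[0,1]}(x_1)\,\mathbb{1}_{[0,1]}(x_2),$$

with $(p_1, q_1) = (3, 2)$ and $(p_2, q_2) = (5, 2)$ as parameter values in the univariate beta densities.

• Function B is the bivariate Dirichlet density

$$m(x_1, x_2) = \frac{\Gamma(\alpha_1 + \alpha_2 + \alpha_3)}{\Gamma(\alpha_1)\Gamma(\alpha_2)\Gamma(\alpha_3)}\, x_1^{\alpha_1 - 1} x_2^{\alpha_2 - 1} (1 - x_1 - x_2)^{\alpha_3 - 1}\, \mathbb{1}_{\{x_1, x_2 \geq 0,\ x_1 + x_2 \leq 1\}}(x_1, x_2),$$

where $\Gamma(\cdot)$ is the classical gamma function, with parameter values $\alpha_1 = \alpha_2 = 5$, $\alpha_3 = 6$ and, therefore, the moderate value of $\rho(x_1, x_2) = -(\alpha_1\alpha_2)^{1/2}(\alpha_1 + \alpha_3)^{-1/2}(\alpha_2 + \alpha_3)^{-1/2} = -0.454$.

• Function C is a bivariate Poisson with null correlation, $\rho(x_1, x_2) = 0$:

$$m(x_1, x_2) = \frac{e^{-5}\, 2^{x_1} 3^{x_2}}{x_1!\, x_2!}\, \mathbb{1}_{\mathbb{N}}(x_1)\,\mathbb{1}_{\mathbb{N}}(x_2).$$

• Function D is a bivariate Poisson with correlation structure

$$m(x_1, x_2) = e^{-(\theta_1 + \theta_2 + \theta_{12})} \sum_{i=0}^{\min(x_1, x_2)} \frac{\theta_1^{x_1 - i}\, \theta_2^{x_2 - i}\, \theta_{12}^{i}}{(x_1 - i)!\,(x_2 - i)!\, i!}\, \mathbb{1}_{\mathbb{N} \times \mathbb{N}}(x_1, x_2),$$

with parameter values $\theta_1 = 2$, $\theta_2 = 3$ and $\theta_{12} = 4$ and, therefore, the moderate value of $\rho(x_1, x_2) = \theta_{12}(\theta_1 + \theta_{12})^{-1/2}(\theta_2 + \theta_{12})^{-1/2} = 0.617$; see, e.g., Yahav and Shmueli (2012).

• Function E is a bivariate beta-Poisson without correlation, $\rho(x_1, x_2) = 0$:

$$m(x_1, x_2) = \frac{x_1^{p_1 - 1}(1 - x_1)^{q_1 - 1}\, 3^{x_2}}{e^{3}\,\mathscr{B}(p_1, q_1)\, x_2!}\, \mathbb{1}_{[0,1]}(x_1)\,\mathbb{1}_{\mathbb{N}}(x_2),$$

with $(p_1, q_1) = (3, 3)$.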
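The two quoted correlation values can be checked numerically. The Python sketch below evaluates the closed-form correlations for the Dirichlet parameters (5, 5, 6) and the bivariate Poisson parameters (2, 3, 4), and cross-checks the latter against the standard bivariate Poisson pmf (the trivariate-reduction form, assumed here for function D) summed over a truncated grid. Function names and the truncation point are choices made for this example.

```python
import math

def dirichlet_correlation(a1, a2, a3):
    """Correlation of the first two Dirichlet(a1, a2, a3) components."""
    return -math.sqrt(a1 * a2 / ((a1 + a3) * (a2 + a3)))

def bivariate_poisson_correlation(t1, t2, t12):
    """Correlation of a bivariate Poisson with common shock rate t12."""
    return t12 / math.sqrt((t1 + t12) * (t2 + t12))

def bivariate_poisson_pmf(x1, x2, t1, t2, t12):
    """Standard bivariate Poisson pmf (assumed form of function D)."""
    total = 0.0
    for i in range(min(x1, x2) + 1):
        total += (t1 ** (x1 - i) * t2 ** (x2 - i) * t12 ** i
                  / (math.factorial(x1 - i) * math.factorial(x2 - i) * math.factorial(i)))
    return math.exp(-(t1 + t2 + t12)) * total

def moments_from_pmf(t1, t2, t12, upper=40):
    """Total mass and correlation computed by summing the pmf over {0..upper-1}^2."""
    mass = e1 = e2 = e11 = e22 = e12 = 0.0
    for x1 in range(upper):
        for x2 in range(upper):
            p = bivariate_poisson_pmf(x1, x2, t1, t2, t12)
            mass += p
            e1 += x1 * p
            e2 += x2 * p
            e11 += x1 * x1 * p
            e22 += x2 * x2 * p
            e12 += x1 * x2 * p
    cov = e12 - e1 * e2
    corr = cov / math.sqrt((e11 - e1 ** 2) * (e22 - e2 ** 2))
    return mass, corr

rho_b = dirichlet_correlation(5.0, 5.0, 6.0)            # equals -5/11
rho_d = bivariate_poisson_correlation(2.0, 3.0, 4.0)    # equals 4/sqrt(42)
mass_d, rho_d_numeric = moments_from_pmf(2.0, 3.0, 4.0)
```

The closed forms give -5/11 (about -0.4545) and 4/√42 (about 0.6172), matching the values quoted for functions B and D up to rounding, and the truncated pmf sums to 1 and reproduces the same correlation.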


Table 3.1 presents the execution times needed for computing the LSCV method for both bivariate beta kernels with respect to only one replication of sample sizes n = 50 and 100 for the target function A. The computational times of the LSCV method for the bivariate beta with correlation structure (2.8) are obviously longer than those without correlation structure. Let us note that for full bandwidth matrices, the execution times become very large when the number of observations is large; however, these CPU times can be considerably reduced by parallel processing, in particular for the bivariate beta kernel with the full LSCV method (2.10). The constraints (2.9) reflect the difficulty of finding the appropriate bandwidth matrix with correlation structure by the LSCV method.

    n      Bivariate beta    Beta×Beta
    50     276.198           7.551
    100    647.255           30.081

Table 3.1: Typical Central Processing Unit (CPU) times (in seconds) for one replication of LSCV method (2.10) by using Algorithms A1 and A2 of Section 2.3.


    Function   n      Bivariate beta    Beta×Beta         Epan×Epan
    A          50     0.4368(0.3754)    0.4266(0.3724)    0.7483(0.2342)
    A          100    0.1727(0.0664)    0.1952(0.0816)    0.6727(0.1413)
    B          50     1.2564(0.5875)    1.4267(0.4024)    1.6675(2.0353)
    B          100    0.3041(0.1151)    0.3362(0.1042)    1.3975(1.5758)

Table 3.2: Some expected values of ASE(κ) and their standard errors in parentheses with Nsim = 100 of some multiple associated kernel regressions for simulated continuous data from functions A with ρ(x1, x2) = 0 and B with ρ(x1, x2) = -0.454.


Table 3.2 reports the average of $ASE(\kappa)$, which we denote $\overline{ASE}(\kappa)$, for three continuous associated kernels $\kappa$ with respect to functions A and B and according to sample sizes $n \in \{50, 100\}$. We can see that both beta kernels in dimension d = 2 work better than the multiple Epanechnikov kernel for all sample sizes and all correlation structures in the regressors. This reflects the appropriateness of the beta kernels, which are suited to the support of rate regressors. Then, the explanatory variables with correlation structure give larger $\overline{ASE}(\kappa)$ than those without correlation structure. Also, both beta kernels give quite similar results. Furthermore, all $\overline{ASE}(\kappa)$ improve when the sample size increases.

Finally, Tables 3.1 and 3.2 highlight that the use of bivariate beta kernels with correlation structure is not recommended in regression with rate explanatory variables. Thus, we focus on multiple associated kernels for the rest of the simulation studies.

Table 3.3 shows the values $\overline{ASE}(\kappa)$ with respect to five associated kernels $\kappa$ for sample sizes n = 20, 50 and 100 and count datasets generated from C and D.

    Function   n      DTr2×DTr2         DTr3×DTr3         Bin×Bin           Epan×Epan         DirDU×DirDU
    C          20     1.5e-6(2.2e-6)    3.3e-6(4.1e-6)    3.6e-5(9.7e-6)    4.0e-5(3.5e-5)    1.6e-8(1.8e-8)
    C          50     3.1e-7(6.9e-7)    4.7e-7(9.7e-7)    3.6e-5(7.4e-6)    3.8e-5(2.8e-5)    3.7e-9(2.3e-9)
    C          100    8.6e-8(1.2e-7)    2.9e-7(3.1e-7)    3.7e-5(4.8e-6)    3.6e-5(2.3e-5)    4.1e-10(3.5e-10)
    D          20     2.4e-6(2.8e-6)    4.5e-6(4.9e-6)    7.1e-6(2.6e-6)    4.2e-6(2.5e-6)    2.7e-8(2.1e-8)
    D          50     2.5e-7(3.4e-7)    1.8e-7(2.5e-7)    8.1e-5(4.3e-6)    5.1e-6(1.2e-6)    4.3e-9(3.2e-9)
    D          100    2.6e-8(6.2e-8)    4.8e-8(9.5e-8)    9.3e-6(8.2e-7)    7.2e-6(7.8e-7)    5.3e-10(4.6e-10)

Table 3.3: Some expected values of ASE(κ) and their standard errors in parentheses with Nsim = 100 of some multiple associated kernel regressions for simulated count data from functions C with ρ(x1, x2) = 0 and D with ρ(x1, x2) = 0.617.


Globally, the discrete associated kernels in the multiple case perform better than the multiple Epanechnikov kernel for all sample sizes and correlation structures in the regressors. The categorical DiracDU kernels give the best result in terms of $\overline{ASE}(\kappa)$, but DiracDU is not suited to these count datasets. Also, the discrete triangular kernels give the most interesting results, with an advantage to the discrete triangular with small arm a = 2. This discrete triangular is the best since it always concentrates on the target and a few observations around it; see Figure 2.1(a). The results become much better when the sample size increases. The values $\overline{ASE}(\kappa)$ for regressors with or without correlation structure are comparable; thus, we can focus on target regression functions without correlation structure for the remaining simulations.


    Function   n      Beta×DTr2       Beta×DTr3       Beta×Bin        Beta×Epan       Beta×DirDU
    E          30     3.738(1.883)    1.966(1.382)    3.884(1.298)    6.361(2.134)    0.162(0.201)
    E          50     3.978(1.404)    2.106(1.119)    3.683(0.833)    7.143(1.732)    0.138(0.171)
    E          100    3.951(1.052)    1.956(0.806)    3.835(0.834)    7.277(1.574)    0.113(0.147)

Table 3.4: Some expected values (x10^) of ASE(κ) and their standard errors in parentheses with Nsim = 100 of some multiple associated kernel regressions of simulated mixed data from function E with ρ(x1, x2) = 0.


Table 3.4 presents the values $\overline{ASE}(\kappa)$ for sample sizes $n \in \{30, 50, 100\}$ and for five associated kernels $\kappa$. The datasets are generated from E and the beta kernel is applied on the continuous rate variable of E. We observe the superiority of the multiple associated kernels using discrete kernels over those defined with the Epanechnikov kernel for all sample sizes. Then, the multiple associated kernel with the categorical DiracDU gives the best $\overline{ASE}(\kappa)$, but it is not appropriate for the count variable of E. Also, the values $\overline{ASE}(\kappa)$ improve when the sample size increases.

Tables 3.2, 3.3 and 3.4 thus show the importance of choosing a type of associated kernel $\kappa$ which respects the support of the explanatory variables.

3.1.2. Multivariate cases 

Since the appropriate associated kernels perform better than the inappropriate ones, we focus in higher dimension d > 2 on regression with only suitable associated kernels. Then, we consider two target regression functions labelled F and G for d = 3 and 4 respectively. The formulas of the functions are given below.

• Function F is a 3-variate with null correlation:

$$m(x_1, x_2, x_3) = \frac{x_1^{p_1 - 1}(1 - x_1)^{q_1 - 1}\, 2^{x_2} 3^{x_3}}{e^{5}\,\mathscr{B}(p_1, q_1)\, x_2!\, x_3!}\, \mathbb{1}_{[0,1]}(x_1)\,\mathbb{1}_{\mathbb{N}}(x_2)\,\mathbb{1}_{\mathbb{N}}(x_3),$$

with $(p_1, q_1) = (3, 2)$.

• Function G is a 4-variate without correlation:

$$m(x_1, x_2, x_3, x_4) = \frac{x_1^{p_1 - 1}(1 - x_1)^{q_1 - 1}\, x_2^{p_2 - 1}(1 - x_2)^{q_2 - 1}\, 2^{x_3} 3^{x_4}}{e^{5}\,\mathscr{B}(p_1, q_1)\,\mathscr{B}(p_2, q_2)\, x_3!\, x_4!}\, \mathbb{1}_{[0,1]}(x_1)\,\mathbb{1}_{[0,1]}(x_2)\,\mathbb{1}_{\mathbb{N}}(x_3)\,\mathbb{1}_{\mathbb{N}}(x_4),$$

with $(p_1, q_1) = (3, 2)$ and $(p_2, q_2) = (5, 2)$.


    n      Beta×DTr3×DTr3    Beta×Bin×DTr3     Beta×Beta×DTr3×DTr3
    30     0.2501(0.1264)    0.3038(0.1258)    0.7448(0.5481)
    50     0.2381(0.0661)    0.2895(0.0162)    0.6055(0.2291)
    100    0.2282(0.0649)    0.2822(0.0608)    0.5012(0.2166)

Table 3.5: Some expected values (x10^) of ASE(κ) and their standard errors in parentheses with Nsim = 100 of some multiple associated kernel regressions of simulated mixed data from 3-variate F and 4-variate G.


Table 3.5 presents the regression study for dimensions d = 3 and 4 with respect to functions F and G and for sample sizes $n \in \{30, 50, 100\}$. For function F, the values $\overline{ASE}(\kappa)$ show the superiority of the multiple associated kernel using the discrete triangular kernel with a = 3 over the one with the binomial kernel. Some results with respect to function G, for an associated kernel $\kappa$ composed of two beta and two discrete triangular kernels with a = 3, are also provided. The errors become smaller when the sample size increases.
3.2. Real data analysis

The first dataset consists of a sample of 38 households from a large US city and is available as the FoodExpenditure object in the betareg package of Cribari-Neto and Zeileis (2010). The dataset in its current form gives not available (NA) responses for associated kernel regressions, especially when we use the discrete triangular or the DiracDU kernel. Then, we extend the original FoodExpenditure dataset with its first 20 observations, which guarantees some results for the regression, and thus n = 58. The dependent variable is food/income, the proportion of household income spent on food. Two explanatory variables are available: the previously mentioned household income ($x_1$) and the number of residents ($x_2$) living in the household, with $\widehat{\rho}(x_1, x_2) = 0.028$. We use the gamma or the Epanechnikov kernel for the continuous variable income, and the discrete kernels (of Figure 2.1(a)) or the Epanechnikov kernel for the count variable number of residents.

The results of the multiple associated kernels for regression are divided in two in Table 3.6. The appropriate associated kernels, which strictly follow the support of each variable, give comparable results in terms of both $RMSE(\kappa)$ and $R^2(\kappa)$. In fact, the associated kernels that use the discrete triangular with arm a = 2 and 3 give $R^2(\kappa)$ approximately equal to 64%. The inappropriate kernels give various results. The multiple Epanechnikov kernel and the type of kernel with DiracDU give $R^2(\kappa)$ higher than 80%, while the Gamma×Epanechnikov gives $R^2(\kappa)$ less than 50%. Thus, a small difference in terms of $RMSE(\kappa)$ can have a large effect on $R^2(\kappa)$.


                     Kernel         RMSE(κ)    R²(κ) (%)
    Appropriate      Gamma×DTr2     0.01409    64.2681
                     Gamma×DTr3     0.01426    64.2708
                     Gamma×Bin      0.01730    56.1091
    Inappropriate    Epan×Epan      0.01451    86.0011
                     Gamma×Epan     0.03266    47.0181
                     Gamma×DirDU    0.01278    89.3462

Table 3.6: Some expected values of RMSE(κ) and in percentages R²(κ) of some multiple associated kernel regressions for the FoodExpenditure dataset with n = 58.


The second dataset, given in Table 3.7, aims to explain the turnover of a large company by two proportion explanatory variables obtained by survey. The first variable $x_1$ is the rate of people who like the company and the second one $x_2$ is the percentage of people who like the strong product of this company. The dataset was obtained from 80 branches of this company. Obviously, there is a significant correlation between these explanatory variables: $\widehat{\rho}(x_1, x_2) = -0.6949$.

    x1i     x2i     yi      x1i     x2i     yi      x1i     x2i     yi
    68.1    54.6    0.8     80.3    9.1     1.5     78.6    31.0    1.6
    60.8    4.4     1.3     23.1    83.1    2.3     44.9    3.2     1.0
    34.4    36.2    1.2     16.9    90.4    2.0     78.2    13.9    1.8
    59.4    27.5    1.3     9.4     79.2    2.8     60.2    35.2    1.1
    4.7     81.0    2.9     55.8    21.9    1.3     65.6    26.1    1.6
    19.9    97.4    1.2     27.5    75.0    2.2     74.4    12.6    1.6
    20.6    73.6    2.4     59.1    12.9    1.4     83.5    13.3    1.8
    16.4    42.9    1.1     2.7     93.9    2.4     10.9    83.5    2.6
    29.9    74.4    2.0     13.9    56.9    1.4     27.0    77.1    2.2
    84.8    26.6    1.6     14.0    92.9    2.1     3.1     67.0    2.2
    46.1    66.9    1.2     22.9    43.9    1.1     14.8    72.9    2.5
    10.2    86.3    2.5     53.8    56.2    1.0     80.6    16.5    1.6
    89.4    32.5    1.6     23.7    61.5    1.5     64.1    28.6    1.5
    30.9    46.3    1.1     39.6    67.2    1.4     15.6    90.5    2.0
    24.3    37.8    1.2     59.5    45.1    0.9     3.9     68.6    2.5
    27.4    74.6    1.9     17.3    81.2    2.6     66.9    43.7    0.9
    47.7    61.7    1.1     93.7    28.5    1.5     1.5     65.8    2.3
    33.1    83.8    1.5     28.7    82.7    2.0     35.6    43.7    1.0
    0.3     83.3    3.0     61.3    70.9    0.6     13.9    25.0    0.8
    76.9    35.4    1.2     67.1    24.0    1.7     13.2    70.8    2.2
    29.5    44.6    1.3     85.8    36.5    1.2     34.5    73.7    1.8
    19.6    67.7    1.9     35.5    76.9    1.8     55.6    6.9     1.3
    96.2    26.1    1.7     18.8    55.9    1.3     30.7    9.1     0.9
    85.9    28.0    1.5     50.4    17.7    1.4     43.5    15.1    1.0
    5.6     39.1    1.1     67.2    8.7     1.5     31.5    36.7    1.2
    99.9    7.15    1.3     13.1    59.4    1.7     30.0    21.5    0.8
    61.0    31.1    1.4     13.7    75.8    2.5

Table 3.7: Proportions (in %) of people who like the company, those who like its strong product and turnover of a company, denoted respectively by the variables x1i, x2i and yi, with ρ̂(x1, x2) = -0.6949 and n = 80.


Table 3.8 presents the results for the nonparametric regressions with three associated kernels $\kappa$. Both beta kernels offer the most interesting results, with $R^2(\kappa)$ approximately equal to 86%. Note that the multiple Epanechnikov kernel gives lower performance, mainly because this continuous unbounded kernel is not suited to these bounded explanatory variables.

    Kernel            RMSE(κ)    R²(κ) (%)
    Bivariate beta    0.10524    86.6875
    Beta×Beta         0.10523    86.6874
    Epan×Epan         0.18886    76.3431

Table 3.8: Some expected values of RMSE(κ) and in percentages R²(κ) of some bivariate associated kernel regressions for the turnover dataset in Table 3.7 with ρ̂(x1, x2) = -0.6949 and n = 80.


4. Summary and final remarks

We have presented associated kernels for nonparametric multiple regression in the presence of a mixture of discrete and continuous explanatory variables; see, e.g., Zougab et al. (2014b) for a choice of the bandwidth matrix by Bayesian methods. Two particular cases, the continuous classical and the multiple (or product of) associated kernels, are highlighted, with the bandwidth matrix selected by cross-validation. Also, six univariate associated kernels and a bivariate beta with correlation structure are presented and used for computational studies.

Simulation experiments and the analysis of two real datasets provide insight into the behaviour of the type of associated kernel $\kappa$ for small and moderate sample sizes. Tables 3.1, 3.2 and 3.8 on bivariate rate regressions can be conceptually summarized as follows. The use of associated kernels with correlation structure is not recommended. In fact, it is time consuming and has the same performance as the multiple beta kernel. Also, these appropriate beta kernels are better than the inappropriate multiple Epanechnikov. For count regressions, the multiple associated kernels built from the binomial and the discrete triangular with small arms are superior to those with the optimal continuous Epanechnikov. Furthermore, the categorical DiracDU kernel gives misleading results since it is not suited to count variables; see Tables 3.3 and 3.4. We advise beta kernels for rate variables and gamma kernels for non-negative datasets, for small and moderate sample sizes and for all dimensions $d \geq 2$; see, e.g., Tables 3.5 and 3.6. Finally, more than the performance of the regression, it is the correct choice of the associated kernel according to the explanatory variables which is the most important. In other words, the criterion for choosing an associated kernel is the support; however, when several kernels match the support, we use common measures such as the mean integrated squared error. It should be noted that a large coefficient of determination $R^2$ does not mean a good fit to the data; see Tables 3.6 and 3.8. Further research on associated kernels for functional regression is conceivable; see, e.g., Amiri et al. (2014) for classical kernels.